Data augmentation for speech separation

Abstract

Deep learning models have advanced the state of the art of monaural speech separation. However, the performance of a separation model considerably decreases when tested on unseen speakers and noisy conditions. Separation models trained with data augmentation generalize better to unseen conditions. In this paper, we conduct a comprehensive survey of data augmentation techniques and apply them to improve the generalization of time-domain speech separation models. The augmentation techniques include seven source-preserving approaches (Gaussian noise, Gain, Time masking, Frequency masking, Short noise, Time stretch, Pitch shift) and three non-source-preserving approaches (Dynamic mixing, Mixup, CutMix). After a hyperparameter search for each augmentation method, we test the generalization of the augmented models by cross-corpus testing on three datasets (LibriMix, TIMIT, VCTK), and identify the best augmentation combination that enhances generalization. Experimental results indicate that several augmentation strategies (CutMix, Dynamic mixing) resulted in better generalization performance. Finally, the combinations of augmentations also improved the performance even when fewer training data are available.
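As a rough illustration of the kinds of waveform-level operations the abstract lists, the following is a minimal NumPy sketch of three of the source-preserving augmentations (Gaussian noise, Gain, Time masking) and a simple form of Dynamic mixing. The function names, parameters (`snr_db`, `gain_db`, `max_len`), and the exact formulation of each operation are assumptions made for this example; the abstract does not specify the paper's implementations or hyperparameters.

```python
import numpy as np

def add_gaussian_noise(x, snr_db, rng):
    """Add white Gaussian noise at an (assumed) target SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def apply_gain(x, gain_db):
    """Scale the waveform amplitude by a gain given in dB."""
    return x * 10 ** (gain_db / 20)

def time_mask(x, max_len, rng):
    """Zero out one random contiguous segment of at most max_len samples."""
    y = x.copy()
    max_len = min(max_len, len(y))
    length = int(rng.integers(0, max_len + 1))
    start = int(rng.integers(0, len(y) - length + 1))
    y[start:start + length] = 0.0
    return y

def dynamic_mix(sources, rng):
    """Pick two different utterances at random and sum them into a new mixture.

    Returns the mixture and the stacked (2, n) reference sources,
    both truncated to the shorter utterance.
    """
    i, j = rng.choice(len(sources), size=2, replace=False)
    n = min(len(sources[i]), len(sources[j]))
    s1, s2 = sources[i][:n], sources[j][:n]
    return s1 + s2, np.stack([s1, s2])
```

In a training loop, operations like these would typically be applied on the fly with freshly sampled parameters for each batch, so the model rarely sees the same mixture twice.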

Similar resources

Two-Stage Data Augmentation for Low-Resourced Speech Recognition

Low resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data. Additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work that focuses on a single technique, we combine multiple, complementary augmentation approaches. The first stage adds noise a...

Audio augmentation for speech recognition

Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factor...

Improving Children's Speech Recognition Through Out-of-Domain Data Augmentation

Children’s speech poses challenges to speech recognition due to strong age-dependent anatomical variations and a lack of large, publicly-available corpora. In this paper we explore data augmentation for children’s speech recognition using stochastic feature mapping (SFM) to transform out-of-domain adult data for both GMM-based and DNN-based acoustic models. We performed experiments on the Engli...

Data augmentation for diffusions

The problem of formal likelihood-based (either classical or Bayesian) inference for discretely observed multi-dimensional diffusions is particularly challenging. In principle this involves data-augmentation of the observation data to give representations of the entire diffusion trajectory. Most currently proposed methodology splits broadly into two classes: either through the discretisation of ...

Spectral clustering for speech separation

Spectral clustering refers to a class of recent techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this chapter, we introduce the main concepts and algorithms together with recent advances in learning the similarity matrix ...

Journal

Journal title: Speech Communication

Year: 2023

ISSN: 1872-7182, 0167-6393

DOI: https://doi.org/10.1016/j.specom.2023.05.009